What follows is more or less a copy of the excellent code and notes by Susan Johnston (Susan.Johnston@ed.ac.uk)
Thank you Susan!
One of the most powerful functions of R is it’s ability to produce a wide range of graphics to quickly and easily visualise data. Plots can be replicated, modified and even publishable with just a handful of commands.
Making the leap from chiefly graphical programmes, such as Excel and Sigmaplot or Minitab may seem tricky. However, with a basic knowledge of R, just investing a few hours could completely revolutionise your data visualisation and workflow. Trust me - it’s worth it.
This file shows how to build different plot types using the basic (i.e. pre-installed) graphics in R, including:
height<-c(145,167,176,123,150)
weight<-c(51,63,64,40,55)
plot(height,weight)
tarsus<-read.csv("tarsus.csv")
head(tarsus)
## TarsusLength Weight
## 1 23 231
## 2 26 258
## 3 25 254
## 4 21 211
## 5 27 268
## 6 28 284
str(tarsus)
## 'data.frame': 16 obs. of 2 variables:
## $ TarsusLength: int 23 26 25 21 27 28 27 26 26 25 ...
## $ Weight : int 231 258 254 211 268 284 271 258 264 251 ...
tarsus$TarsusLength
## [1] 23 26 25 21 27 28 27 26 26 25 26 24 25 25 23 21
tarsus$Weight
## [1] 231 258 254 211 268 284 271 258 264 251 258 244 251 248 234 211
plot(tarsus$TarsusLength,tarsus$Weight)
tarsus_tab<-table(tarsus$TarsusLength)
tarsus_tab
##
## 21 23 24 25 26 27 28
## 2 2 1 4 4 2 1
plot(tarsus_tab)
barplot(tarsus_tab)
What customisations are we going to learn in this section?
For this part, we will use data on birthweight measured in male and female unicorns.
Lets read the data into R:
unicorns<-read.csv("unicorns.csv")
head(unicorns)
## birthweight sex longevity
## 1 4.478424 Male 1
## 2 5.753458 Male 0
## 3 3.277265 Male 0
## 4 3.929379 Male 0
## 5 3.972810 Male 0
## 6 4.912954 Male 0
str(unicorns)
## 'data.frame': 1000 obs. of 3 variables:
## $ birthweight: num 4.48 5.75 3.28 3.93 3.97 ...
## $ sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
## $ longevity : int 1 0 0 0 0 0 1 0 0 1 ...
We can create a basic histograms of unicorn birthweight and longevity using hist():
hist(unicorns$birthweight) # normal distribution
hist(unicorns$longevity) # poisson distribution
And we can specify the number of cells for the histogram using: breaks = N:
hist(unicorns$birthweight, breaks = 40)
hist(unicorns$birthweight, breaks = c(0,1,2,3,4,5,6,7))
hist(unicorns$birthweight, breaks = c(0,1,2,3,4,7))
Relabel the x-axis using: xlab = “Text”
hist(unicorns$birthweight, breaks = 40, xlab = "Birth Weight")
Relabel the main title using: main = “Text”
hist(unicorns$birthweight,
breaks = 40,
xlab = "Birth Weight",
main = "Histogram of Unicorn Birth Weight")
NB: In our code, the lines were starting to get quite long so we have spread it over several lines. When there is a comma, R knows that there is more information on the next line!
The y-axis stops short of the highest value in the histogram. Lets specify new limits using: ylim = c(minimum, maximum)
hist(unicorns$birthweight,
breaks = 40,
xlab = "Birth Weight",
main = "Histogram of Unicorn Birth Weight",
ylim = c(0,80))
hist(unicorns$birthweight, # x value
breaks = 40, # number of cells
xlab = "Birth Weight", # x-axis label
main = "Histogram of Unicorn Birth Weight", # plot title
ylim = c(0,80)) # limits of the y axis (min,max)
Moomins are a common pest species in Finland. We have data on their population on the island of Ruissalo from 1971 to 2000.
Which customisations will we learn here?
moomins<-read.csv("Moomin Density.csv")
head(moomins)
## Year PopSize
## 1 1971 500
## 2 1972 562
## 3 1973 544
## 4 1974 532
## 5 1975 580
## 6 1976 590
plot(moomins$Year, moomins$PopSize)
There are several types of plot within the plot function. Use “type”:
plot(moomins$Year, moomins$PopSize, type = "l") # Try "o" "p" "l" "b"
We can also change the line type using “lty”
plot(moomins$Year, moomins$PopSize, type = "l", lty = "dashed")
plot(moomins$Year, moomins$PopSize, type = "l", lty = "dotted")
The solid line looks best, so lets stick with it.
plot(moomins$Year, moomins$PopSize, type = "l")
Let’s start to add colour using “col”.
plot(moomins$Year, moomins$PopSize, type = "l", col = "red") # R Colour Chart
NB. numbers can also be used as colours!
plot(moomins$Year, moomins$PopSize, type = "l", col = 3)
Let’s make the line a little thicker using “lwd” (line width)
plot(moomins$Year, moomins$PopSize, type = "l", col = "red", lwd = 3)
Finally, lets sort out the axis titles plot title:
Also:
Is the Moomin population increasing in size?
We can add a basic linear regression to the plot and then add the regression line to the plot using “abline”
NB. we can also use lty, lwd, col here
Finally, we can add some text to the plot giving the R2 value and the P value using “text”
We need to specify the x and y coordinates for the text
plot(moomins$Year, moomins$PopSize,
type = "l",
col = "red",
lwd = 3,
xlab = "Year",
ylab = "Population Size",
main = "Moomin Population Size on Ruissalo 1971 - 2001")
fit1 <- lm (PopSize ~ Year, data = moomins)
summary(fit1)
##
## Call:
## lm(formula = PopSize ~ Year, data = moomins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -109.517 -17.760 1.646 20.373 63.832
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.249e+04 1.490e+03 -15.10 5.56e-15 ***
## Year 1.167e+01 7.504e-01 15.56 2.61e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.58 on 28 degrees of freedom
## Multiple R-squared: 0.8963, Adjusted R-squared: 0.8926
## F-statistic: 242 on 1 and 28 DF, p-value: 2.615e-15
abline(fit1, lty = "dashed") #abline(a=intercept,b=slope)
text(x=1974,y=750,labels="R2 = 0.896\nP = 2.615e-15")
plot(moomins$Year, moomins$PopSize, # x variable, y variable
type = "l", # draw a line graphs
col = "red", # red line colour
lwd = 3, # line width of 3
xlab = "Year", # x axis label
ylab = "Population Size", # y axis label
main = "Moomin Population Size on Ruissalo 1971 - 2001") # plot title
fit1 <- lm (PopSize ~ Year, data = moomins) # carry out a linear regression
abline(fit1, lty = "dashed") # add the regression line to the plot
text(x=1974,y=750,labels="R2 = 0.896\nP = 2.615e-15") # add a label to the plot at coordinates (x,y)
What will we learn here?
R comes with many datasets preinstalled. Let’s load a dataset of flower characteristics in 3 species of Iris.
data(iris)
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
There is a lot of data here! Let’s explore using the ‘pairs’ function
pairs(iris)
This doesn’t tell us much about the species differences. We can tell R to plot using a different colour for the three species of iris:
pairs(iris, col = iris$Species)
Sepal.Length and Petal.Length look interesting! Let’s start by looking at that.
Again, we will specify colour as the Species.
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species)
These points are difficult to see! Let’s pick some different ones using “pch”
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species, pch = 15)
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species, pch = "A")
pch 21:25 also specify an edge colour (col) and a background colour (bg)
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species, pch = 21, bg = "blue")
let’s settle on solid circles (pch = 16)
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species, pch = 16)
we can change the size of the points with “cex”
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species, pch = 16, cex = 2)
It’s difficult to tell these points apart, so perhaps we should make a legend.
This is one of the major drawbacks with R.
iris$Species is a factor, and R will automatically order factors in alphabetical order.
levels(iris$Species)
## [1] "setosa" "versicolor" "virginica"
Therefore, setosa, versicolor and virginica will correspond to 1, 2 and 3 on the plot default colours. Keep this in mind for the final plot, where we include a legend!
plot(iris$Sepal.Length, iris$Petal.Length, # x variable, y variable
col = iris$Species, # colour by species
pch = 16, # type of point to use
cex = 2, # size of point to use
xlab = "Sepal Length", # x axis label
ylab = "Petal Length", # y axis label
main = "Flower Characteristics in Iris") # plot title
legend (x = 4.5, y = 7, legend = levels(iris$Species), col = c(1:3), pch = 16)
# legend with titles of iris$Species and colours 1 to 3, point type pch at coords (x,y)
It is also possible to specify colours in your data frame.
iris$Colour <- ""
iris$Colour[iris$Species=="setosa"] <- "magenta"
iris$Colour[iris$Species=="versicolor"] <- "cyan"
iris$Colour[iris$Species=="virginica"] <- "yellow"
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Colour, pch = 16, cex = 2)
legend(x = 4.5, y = 7, legend = c("setosa","versicolor","virginica"),col=c("magenta","cyan","yellow"), pch=16)
It would also be possible to specify lines in the legend by using “lty” instead of “pch”
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species, pch = 16, cex = 2)
legend(4.5,7,legend=c("setosa","versicolor","virginica"),col=c(1:3),lty="solid")
We will continue to use the Iris dataset for this section
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Colour
## 1 5.1 3.5 1.4 0.2 setosa magenta
## 2 4.9 3.0 1.4 0.2 setosa magenta
## 3 4.7 3.2 1.3 0.2 setosa magenta
## 4 4.6 3.1 1.5 0.2 setosa magenta
## 5 5.0 3.6 1.4 0.2 setosa magenta
## 6 5.4 3.9 1.7 0.4 setosa magenta
boxplot(iris$Sepal.Length ~ iris$Species)
If you wish to compare the medians of the boxplot, you can use the function “notch”.
If the notches of two plots do not overlap, this is ‘strong evidence’ that the two medians differ (see ?boxplot)
boxplot(iris$Sepal.Length ~ iris$Species, notch = T)
You may have noticed that the y-axis labels are always orientated to be perpendicular to the axis. We can rotate all axis labels using “las”. Play around with different values.
boxplot(iris$Sepal.Length ~ iris$Species, notch = T, las = 1)
Let’s add in all the axis and plot labels:
boxplot(iris$Sepal.Length ~ iris$Species, notch = T, las = 1,
xlab = "Species", ylab = "Sepal Length", main = "Sepal Length by Species in Iris")
Like we can change the size of the points in the scatterplot, we can change the size of the axis labels and titles. Let’s start with “cex.lab”, which controls the axis titles:
boxplot(iris$Sepal.Length ~ iris$Species, notch = T, las = 1,
xlab = "Species", ylab = "Sepal Length", main = "Sepal Length by Species in Iris",
cex.lab=1.5)
Now we can add in “cex.axis” (changing the tickmark size) and “cex.main” (changing the plot title size)
boxplot(iris$Sepal.Length ~ iris$Species, notch = T, las = 1,
xlab = "Species", ylab = "Sepal Length", main = "Sepal Length by Species in Iris",
cex.lab = 1.5,
cex.axis = 1.5,
cex.main = 2)
As we discussed earlier, R automatically puts factors in alphabetical order. But perhaps we would prefer to list the iris species as virginica, versicolor and setosa.
First lets look at the levels of iris:
data(iris)
levels(iris$Species)
## [1] "setosa" "versicolor" "virginica"
we reorder them with the following command:
iris$Species<-factor(iris$Species, levels = c("virginica","versicolor","setosa"))
boxplot(iris$Sepal.Length ~ iris$Species, # x variable, y variable
notch = T, # Draw notch
las = 1, # Orientate the axis tick labels
xlab = "Species", # X-axis label
ylab = "Sepal Length", # Y-axis label
main = "Sepal Length by Species in Iris", # Plot title
cex.lab = 1.5, # Size of axis labels
cex.axis = 1.5, # Size of the tick mark labels
cex.main = 2) # Size of the plot title
Let’s create a new data frame with information on three populations of dragon in the UK:
Working with summaries, rather than the whole data, is a bit easier with this function.
dragons <- data.frame(TalonLength = c(20.9, 58.3, 35.5),
SE = c(4.5, 6.3, 5.5),
Population = c("England", "Scotland", "Wales")
)
Have a look at the data:
dragons
## TalonLength SE Population
## 1 20.9 4.5 England
## 2 58.3 6.3 Scotland
## 3 35.5 5.5 Wales
Let’s make our barplot.
barplot(dragons$TalonLength)
It would be better to add Titles to the x-axis:
barplot(dragons$TalonLength, names = dragons$Population)
would a box look better around this plot?
barplot(dragons$TalonLength, names = dragons$Population)
box()
not really. Let’s start again:
barplot(dragons$TalonLength, names = dragons$Population)
Let’s reorder the columns by how beautiful the dragon habitat is (from best to worst). Suppose this order is ‘Scotland, Wales, England’.
levels(dragons$Population)
## [1] "England" "Scotland" "Wales"
dragons$Population <- factor(dragons$Population, levels=c("Scotland","Wales","England"))
levels(dragons$Population)
## [1] "Scotland" "Wales" "England"
barplot(dragons$TalonLength, names = dragons$Population)
No…. it’s not working. I give up for now.
What about error bars?
This is as far as I got before I gave up:
#install.packages("gplots") ## do this in the console if necessary
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
barplot(dragons$TalonLength, names = dragons$Population,
ylim=c(0,70),xlim=c(0,4),yaxs='i', xaxs='i',
main="Dragon Talon Length in the UK",
ylab="Mean Talon Length",
xlab="Country")
par(new=T)
plotCI (dragons$TalonLength,
uiw = dragons$SE, liw = dragons$SE,
gap=0,sfrac=0.01,pch="",
ylim=c(0,70),
xlim=c(0.4,3.7),
yaxs='i', xaxs='i',axes=F,ylab="",xlab="")
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "axes" is not a
## graphical parameter
box()
Lets deal with this in ggplot2!
Do it in ggplot2!
par(mfrow=c(1,2)) # number of rows, number of columns
plot(iris$Sepal.Length, iris$Petal.Length, # x variable, y variable
col = iris$Species, # colour by species
main = "Sepal vs Petal Length in Iris") # plot title
plot(iris$Sepal.Length, iris$Sepal.Width, # x variable, y variable
col = iris$Species, # colour by species
main = "Sepal Length vs Width in Iris") # plot title
par(mfrow=c(1,1)) # sets the plot window back to normal
OR
dev.off() # But this will clear your plot history.
## null device
## 1
These code code chunks show how to save a plot into your working directory as either a .png file or a pdf file
png("Sepal vs Petal Length in Iris.png", width = 500, height = 500, res = 72)
plot(iris$Sepal.Length, iris$Petal.Length,
col = iris$Species,
main = "Sepal vs Petal Length in Iris")
dev.off()
## quartz_off_screen
## 2
pdf("Sepal vs Petal Length in Iris.pdf")
plot(iris$Sepal.Length, iris$Petal.Length,
col = iris$Species,
main = "Sepal vs Petal Length in Iris")
dev.off()
## quartz_off_screen
## 2
plot(moomins$Year, moomins$PopSize, type="l")
boxplot(iris$Sepal.Length ~ iris$Species)
par(col=2,pch=16,las=1,lty="dotted") # affects all subsequent plots
plot(moomins$Year, moomins$PopSize, type="l")
boxplot(iris$Sepal.Length ~ iris$Species)
dev.off() # resets par() to default values
## null device
## 1
Now that par() has been reset to defaut values, subsequent plots have the default appearance
plot(moomins$Year, moomins$PopSize, type="l")
boxplot(iris$Sepal.Length ~ iris$Species)
par(mfrow=c(1,1))
# try these in your console window
example(plot)
example(barplot)
example(boxplot)
example(coplot)
example(hist)
example(fourfoldplot)
example(stars)
example(image)
example(contour)
example(filled.contour)
example(persp)
scatterplots for each combination of two factors
library(lattice)
xyplot(iris$Petal.Length ~ iris$Sepal.Length | iris$Species, layout = c(3,1))
histogram( ~ unicorns$birthweight | unicorns$sex, layout = c(1,2))